DOC: Docstring additions for min_itemsize #62067

Closed
wants to merge 5 commits into from

JoeDediop

Problem Summary

The current pandas documentation for min_itemsize in HDFStore methods doesn’t clearly explain that it refers to byte length, not character length. This causes confusion when working with multi-byte characters.

Proposed Addition to HDFStore.put() and HDFStore.append() docstrings

Add this clarification to the min_itemsize parameter description in the appropriate methods:

min_itemsize : int, dict, or None, default None
    Minimum size in bytes for string columns when format='table'.
    int - apply the same minimum size to all string columns.
    dict - map column names to their minimum sizes.
    None - use the default sizing.
    Important: This specifies the byte length after encoding, not the
    character count. For multi-byte characters, calculate the required
    size using the encoded byte length.
    See the examples below for usage.

Also add this to the Examples section of each docstring:

    Examples:
    - ASCII 'hello' = 5 bytes
    - UTF-8 '香' = 3 bytes (though only 1 character)
    - To find byte length: len(string.encode('utf-8'))
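
The byte counts above can be checked with the standard library alone. The following is a minimal sketch, not part of the proposed docstring: `utf8_len` and `min_itemsize_for` are hypothetical helper names, the sample data is invented for illustration, and UTF-8 is assumed as the encoding.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def min_itemsize_for(columns: dict[str, list[str]]) -> dict[str, int]:
    """Hypothetical helper: widest encoded value per column, in bytes.

    Assumes every column has at least one value.
    """
    return {
        name: max(utf8_len(value) for value in values)
        for name, values in columns.items()
    }


# Invented sample data: one column mixes ASCII and multi-byte strings.
sample = {"city": ["Paris", "香港"], "code": ["FR", "HK"]}
sizes = min_itemsize_for(sample)

print(utf8_len("hello"), utf8_len("香"))  # 5 3
print(sizes)  # {'city': 6, 'code': 2}
```

Note that "香港" is two characters but six bytes, so a size chosen from the character count would be too small. A dict like `sizes` is the shape that the `min_itemsize` parameter of `HDFStore.put()` or `HDFStore.append()` accepts when `format='table'`, assuming the store also encodes strings as UTF-8.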

Why This Helps

@mroeschke
Member

AI-generated pull requests are not welcome in this project, closing

@mroeschke mroeschke closed this Aug 7, 2025

2 participants